(57) Abstract: Provided is a system and method for broadcasting stereoscopic video data to users on the Internet based on Moving
Picture Experts Group (MPEG)-4. The system includes: an encoding server for receiving stereoscopic video data, audio data, and
Object Descriptor/Binary Format for Scene (OD/BIFS), which is information for controlling a content, and encoding the data into
an elementary stream (ES) having an MPEG-4 structure; a web server for receiving from the client any one among a two-dimensional
video display mode, a field-shuttering video display mode and a frame-shuttering video display mode; and a streaming server for
generating a real-time transport protocol (RTP) packet for real-time data transmission on the Internet by multiplexing the ES
based on the display mode inputted into the web server, and transmitting the RTP packet to the client.
Description

SYSTEM AND METHOD FOR INTERNET BROADCASTING OF MPEG-4-BASED STEREOSCOPIC VIDEO
Technical Field 

[1] The present invention relates to a Web broadcasting system and method; and, more
particularly, to a system and method for broadcasting a stereoscopic video to users on
the Internet based on Moving Picture Experts Group (MPEG)-4.

Background Art 

[2] A 'stereoscopic video' means a moving picture that is produced by receiving and
outputting left-eye data and right-eye data alternately to give a three-dimensional far and
near distance effect to two-dimensional planes.

[3] Along with the recent development of the Internet, diverse multimedia data in the
fields of education, culture, current issues and the like are provided to Internet users.
Internet users can watch and/or listen to the multimedia data they want at any time and at any
place as long as they have clients connected to the Internet.

[4] Generally, Web broadcasting systems, which are also referred to as Internet
broadcasting systems, are formed of an encoding server for encoding multimedia data
based on a predetermined encoding method, a streaming server for transmitting the
multimedia stream, and clients for decoding and outputting the transmitted multimedia
stream.

[5] Fig. 1 is a block diagram illustrating a typical Internet broadcasting system. As
shown, video data and audio data are inputted from a video/audio input device 10, such
as a video camera, and compressed as they pass through an encoding server 20.

[6] MPEG is a group of moving picture experts that was formed to establish
standards for moving picture encoding methods. MPEG studies the compression of moving
pictures, which vary continuously over time, and the transmission
of the coded data. MPEG proposes international encoding standards, and current
Internet broadcasting is performed based on these standards. In particular, MPEG-1 and
MPEG-2 are international standards that are used for compressing and storing
large-volume multimedia data.

[7] A streaming server 30 transmits the multimedia stream, which is encoded by the
encoding server 20, to clients 50 through the Internet 40. Then, the clients 50 decode
the transmitted multimedia stream. The clients 50 should have a player with a codec to
output the multimedia data.

In the meantime, some problems may occur when stereoscopic video data are
transmitted using conventional encoding methods and the current Internet broadcasting
system. Since left-eye images and right-eye images should be encoded separately to
transmit stereoscopic video data to the clients through the Internet, the amount of data
is more than doubled and the probability of transmission error becomes higher
due to the load of transmission traffic. Moreover, there is a problem that the clients
should discriminate between the left-eye images and the right-eye images in order to
decode them and output them synchronized with each other temporally. If the left-eye
images and the right-eye images are not outputted alternately, a three-dimensional
effect cannot be obtained, and the only result is eye fatigue for viewers.

Therefore, a new encoding method, other than conventional encoding methods, is 
required to broadcast stereoscopic video data on the Internet as well as an Internet 
broadcasting system and method coinciding with the encoding method. 
Disclosure of Invention 

Technical Solution 

It is, therefore, an object of the present invention to provide a system and method
for broadcasting stereoscopic video data on the Internet by encoding and multiplexing
multimedia data based on a structure of Moving Picture Experts Group-4 (MPEG-4)
temporal scalability (TS).

It is another object of the present invention to provide an Internet broadcasting 
system and method that can broadcast conventional two-dimensional video data on the 
Internet. 

In accordance with one aspect of this invention, there is provided a system for
broadcasting stereoscopic video data to a client on the Internet, including: an encoding
server for receiving stereoscopic video data, audio data, and Object Descriptor/Binary
Format for Scene (OD/BIFS), which is information for controlling a content, and
encoding the data into an elementary stream (ES) having an MPEG-4 structure; a web
server for receiving from the client any one among a two-dimensional video display
mode, a field-shuttering video display mode and a frame-shuttering video display mode;
and a streaming server for generating a real-time transport protocol (RTP) packet for
real-time data transmission on the Internet by multiplexing the ES based on the display
mode inputted into the web server, and transmitting the RTP packet to the client.

In accordance with another aspect of the present invention, there is provided a method
for broadcasting stereoscopic video data to a client on the Internet based on MPEG-4,
including the steps of: a) receiving stereoscopic video data, audio data, and OD/BIFS
data, which is information for controlling a content, and encoding the data into an ES
having an MPEG-4 structure; b) receiving any one among a two-dimensional video
display mode, a field-shuttering video display mode and a frame-shuttering video display
mode from the client; and c) generating an RTP packet for real-time transmission on
the Internet by multiplexing the ES based on the inputted display mode, and
transmitting the RTP packet to the client.

Description of Drawings 

The above and other objects and features of the present invention will become 
apparent from the following description of the preferred embodiments given in 
conjunction with the accompanying drawings, in which: 

Fig. 1 is a block diagram illustrating a typical Internet broadcasting system;

Fig. 2 is a block diagram depicting an Internet broadcasting system in accordance
with a preferred embodiment of the present invention;

Fig. 3 is a block diagram showing an encoding server of Fig. 2 in detail;

Fig. 4 is a block diagram showing an encoder of Fig. 3 in detail;

Fig. 5 is a diagram showing video data inputted into each layer of a Moving
Picture Experts Group-4 (MPEG-4) structure in accordance with the preferred
embodiment of the present invention;

Fig. 6 is a block diagram illustrating an MPEG-4 (MP4) file generator of Fig. 3 in
detail;

Figs. 7 and 8 are diagrams describing arrangements of elementary streams (ES) of an
MP4 file;

Fig. 9 is a block diagram illustrating a streaming server of Fig. 2 in detail; and

Fig. 10 is a diagram depicting a packing transformation process in the streaming
server.

Mode for Invention 

Other objects and aspects of the invention will become apparent from the following
description of the embodiments with reference to the accompanying drawings, which
is set forth hereinafter. The terms and words used in the present specification and
claims should not be construed according to their conventional or dictionary meanings;
rather, they should be construed as having meanings and concepts consistent with the
technological concept of the present invention, based on the principle that inventors
may define the concept of terms appropriately to describe their invention in the best
way. Accordingly, the embodiments and drawings of the present specification are no more
than preferred embodiments and do not represent all of the technological concept of the
present invention. In this respect, there may be various equivalents and modifications
that can replace the elements illustrated in the specification as of the filing of the
present patent application.

Fig. 2 is a block diagram depicting an Internet broadcasting system in accordance
with a preferred embodiment of the present invention. As shown, multimedia data (i.e.,
stereoscopic video data or audio data) or content-controlling Object Descriptor/Binary
Format for Scene (OD/BIFS) data obtained from a stereoscopic video camera or a
video/audio input device 100, such as a video tape recorder (VTR), are inputted into an
encoding server 200. Then, the encoding server 200 encodes the inputted signals based
on Moving Picture Experts Group-4 (MPEG-4). An elementary stream (ES) obtained
by encoding the signals in the encoding server 200 is transmitted to a streaming server
300.

To encode the stereoscopic video, the present invention uses MPEG-4 temporal
scalability (TS). MPEG-4 TS is a structure where inputted left-eye images are
allocated to a base layer and right-eye images are allocated to an enhancement layer.
The left-eye images allocated to the base layer are encoded based on conventional
two-dimensional video encoding. The right-eye images allocated to the enhancement
layer are encoded with reference to the image of the base layer, which is overlapped
with that of the enhancement layer.

Meanwhile, a web server 400 receives information on the content and the display mode
requested by a client 600 through a back channel and transmits them to the streaming
server 300. The streaming server 300 multiplexes the ES of the content in the display
mode requested by the client 600 to generate multimedia data, e.g., a real-time
transport protocol (RTP) packet, and transmits the multimedia data to the client 600
through the Internet. The client 600 decodes and displays the data in the transmitted
order. To output the multimedia data, the client 600 must have a player with a codec.

Fig. 3 is a block diagram showing an encoding server of Fig. 2. As shown, the
encoding server 200 includes an encoder 210, an encoding parameter unit 220, an
MPEG-4 (MP4) file generator 230 for generating an MP4 file by using the encoded
ES, and a storage 240 for storing the MP4 file.

The encoding parameter unit 220 provides information for encoding the inputted
stereoscopic video. It sets up parameters for encoding, such as the size of an image, the
number of frames to be encoded, a frame rate, the size of the motion search, a transmission
bit rate, and an initial quantization coefficient, and inputs them to the encoder 210.

The encoder 210 encodes the inputted stereoscopic video data and audio data based
on the MPEG-4 TS and an audio codec. Internal modules of the encoder 210 are
illustrated in Fig. 4.

Referring to Fig. 4, the encoder 210 includes a video encoding module 212 for
encoding stereoscopic video data, an Elementary Stream Interface (ESI) information
generating module 216, an audio encoding module 218 for encoding audio data, and an
OD/BIFS encoding module 219 for encoding OD/BIFS data.

The OD/BIFS encoding module 219 encodes binary format for scene (BIFS) for 
describing audio and scenes and object descriptor (OD) for defining the relationship 
between media streams. 

The ESI information generating module 216 generates additional information 
needed for the transmission and decoding of ES, such as a data length of ES, an idle 
flag, and a length of access unit (AU), which are included in a header information of a 
synchronization layer (SL). The header information of SL will be described later. 

The video encoding module 212 further includes a field separating module 213, a
base layer encoding module 214, and an enhancement layer encoding module 215. The
field separating module 213 separates stereoscopic three-dimensional video data into
a left-eye odd field, a left-eye even field, a right-eye odd field, and a right-eye even
field. The base layer encoding module 214 encodes the left-eye odd field, and the
enhancement layer encoding module 215 encodes the left-eye even field, the right-eye odd
field and the right-eye even field.

Fig. 5 is a diagram showing fields separated by the field separating module being
inputted into each layer of an MPEG-4 structure in accordance with the preferred
embodiment of the present invention. As shown, the left-eye odd field is inputted into
the base layer; the left-eye even field into a first enhancement layer; the right-eye odd
field into a second enhancement layer; and the right-eye even field into a third
enhancement layer.
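
The field separation and layer allocation described above can be illustrated with a minimal Python sketch. It is not the encoder of the embodiment: the names `separate_fields`, `allocate_to_layers` and `LAYER_OF_FIELD` are hypothetical, a frame is modelled as a plain list of scan lines, and the sketch assumes that the odd field consists of the 1st, 3rd, 5th, ... scan lines of a frame.

```python
# Illustrative sketch only; names and the line-numbering convention are assumptions.

# Layer allocation as described for Fig. 5.
LAYER_OF_FIELD = {
    "left_odd": "base",
    "left_even": "enhancement_1",
    "right_odd": "enhancement_2",
    "right_even": "enhancement_3",
}


def separate_fields(left_frame, right_frame):
    """Split a left-eye and a right-eye frame (lists of scan lines) into four fields.

    Assumption: scan lines are counted from 1, so the odd field holds
    list indices 0, 2, 4, ... and the even field holds indices 1, 3, 5, ...
    """
    return {
        "left_odd": left_frame[0::2],
        "left_even": left_frame[1::2],
        "right_odd": right_frame[0::2],
        "right_even": right_frame[1::2],
    }


def allocate_to_layers(fields):
    """Map each separated field to the layer named in the description."""
    return {LAYER_OF_FIELD[name]: lines for name, lines in fields.items()}


if __name__ == "__main__":
    left = ["L1", "L2", "L3", "L4"]
    right = ["R1", "R2", "R3", "R4"]
    layers = allocate_to_layers(separate_fields(left, right))
    for layer, lines in layers.items():
        print(layer, lines)
```

Only the base-layer field can be decoded on its own in this arrangement; the three enhancement-layer fields are the ones a TS encoder would predict from other layers, which the sketch does not attempt to model.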

Fig. 6 is a block diagram illustrating an MP4 file generator of Fig. 3. As shown, the
MP4 file generator 230, which receives video/audio ES, OD/BIFS ES and ESI
information from the encoder 210, includes a media data providing module 232, a
metadata providing module 234 and an MP4 file generating module 236.

The media data providing module 232 is a buffer for receiving video ES, audio ES
and OD/BIFS ES, which are encoded on a field-by-field basis. It transmits the ES as
media data to the MP4 file generating module 236.

The metadata providing module 234 is a buffer for receiving ESI information
transmitted from the encoder 210, and transmitting the ESI information as metadata
to the MP4 file generating module 236.

The MP4 file generating module 236 converts the inputted ES and the metadata into
an MP4 file format. The purpose is to receive the ES outputted from the encoder
together with the additional information for the ES, and to generate and store a file
in a format suitable for transmission, from which the ES matching the display mode
requested by a user can be extracted.

An MP4 file has two zones: one is a metadata zone for storing file information, and
the other is an mdata Atom zone for storing ES. Each ES stored in the mdata Atom zone
is given its own ES identification (ES_ID) to discriminate the encoded ES.

Fig. 7 is an exemplary diagram illustrating an arrangement of ES in the mdata
Atom for storing the media data, the ES being given four ES_IDs based on the right and
left odd and even fields. Fig. 8 is an exemplary diagram illustrating an arrangement of
ES for stereoscopic video data in the mdata Atom by multiplexing four fields of the
ES. The ES is inputted on a four-field basis, i.e., a left-eye odd field, a right-eye even
field, a left-eye even field and a right-eye odd field. One ES_ID is allocated to four
fields having the same time information.
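
The two arrangements of Figs. 7 and 8 can be contrasted with a small Python sketch. This is only an illustration under assumptions: the function names, the ES_ID values 1 to 4, and the byte strings standing in for encoded fields are hypothetical and are not taken from the MP4 file format.

```python
# Illustrative sketch only; ES_ID values and names are hypothetical.

# Field order used when four fields are multiplexed into one ES
# (the four-field order given in the description of Fig. 8).
MUX_ORDER = ["left_odd", "right_even", "left_even", "right_odd"]


def arrange_separate(samples):
    """Fig. 7 style: one ES per field type, each with its own ES_ID.

    `samples` is a list of dicts mapping field name -> encoded bytes,
    one dict per time instant.
    """
    es_ids = {"left_odd": 1, "left_even": 2, "right_odd": 3, "right_even": 4}
    streams = {es_id: [] for es_id in es_ids.values()}
    for sample in samples:
        for name, es_id in es_ids.items():
            streams[es_id].append(sample[name])
    return streams


def arrange_multiplexed(samples, es_id=1):
    """Fig. 8 style: the four fields sharing a time instant are stored
    back to back under a single ES_ID."""
    stream = []
    for sample in samples:
        stream.extend(sample[name] for name in MUX_ORDER)
    return {es_id: stream}
```

With separate ES_IDs a reader can pick out individual field streams, whereas the multiplexed arrangement keeps the fields of one time instant adjacent in storage.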

The MP4 file generated through the above processes is stored in a storage 240 and 
extracted by the streaming server 300. 

Fig. 9 is a block diagram illustrating a streaming server of Fig. 2. As shown, the
streaming server 300 extracts MP4 files stored in the storage 240, or receives the
ES and ESI information encoded by the encoder 210, generates a real-time transport
protocol (RTP) packet that coincides with a user's request, and transmits it to a client
600.

In order to generate the RTP packet that coincides with the user's request, the display
mode requested by the user should be inputted into the streaming server 300.
Accordingly, the display mode requested by the user is inputted from the client
600 into the web server 400 and then transmitted to the streaming server 300.

In the Internet broadcasting system of the present invention, video data are encoded
after being divided into a left-eye odd field, a left-eye even field, a right-eye odd field
and a right-eye even field. Therefore, conventional two-dimensional video data,
field-shuttering three-dimensional video data and frame-shuttering three-dimensional video
data can all be processed in this system.

For example, if a user wants the conventional two-dimensional video display, the
streaming server 300 transmits a stream of the left-eye odd field and the left-eye even
field. If the user wants the field-shuttering three-dimensional video display, it extracts and
transmits a stream of the left-eye odd field and the right-eye even field. Likewise, if
the user wants the frame-shuttering three-dimensional display, it transmits a stream of all
of the four fields.
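
The mode-dependent selection of field streams can be sketched in Python as follows. The mode names and the `select_fields` helper are hypothetical; only the mapping from display mode to fields is taken from the description, and the field order used for the frame-shuttering mode follows the order listed in claim 7.

```python
# Illustrative sketch only; mode names and helper names are hypothetical.

FIELDS_FOR_MODE = {
    # Two-dimensional display: both fields of the left-eye image.
    "2d": ["left_odd", "left_even"],
    # Field-shuttering 3D display: one field from each eye.
    "field_shuttering": ["left_odd", "right_even"],
    # Frame-shuttering 3D display: all four fields.
    "frame_shuttering": ["left_odd", "left_even", "right_odd", "right_even"],
}


def select_fields(display_mode, access_unit):
    """Return the encoded fields to transmit for one time instant, in sending order.

    `access_unit` maps field name -> encoded bytes for that time instant.
    """
    try:
        wanted = FIELDS_FOR_MODE[display_mode]
    except KeyError:
        raise ValueError(f"unknown display mode: {display_mode!r}") from None
    return [access_unit[name] for name in wanted]
```

Because the server drops the unneeded fields before transmission, a two-dimensional or field-shuttering client receives half as many fields as a frame-shuttering client.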

If the user's request on the display mode is inputted into an MP4 file analyzing
module 310 through the web server 400, the MP4 file analyzing module 310 extracts the
needed AU stream and ESI information from the MP4 files stored in the storage 240.
Here, the MP4 file analyzing module 310 can also receive the AU stream and the ESI
information from the encoder 210 in real-time.

When the MP4 file analyzing module 310 extracts the AU stream and the ESI
information based on the request of the user, an SL packet generating module 320
generates an SL packet having a header and a payload for the extracted AU stream.
The header of the SL packet carries synchronization information for each packet and is
used to check continuity when data loss occurs. The header includes information for
controlling time synchronization, such as a time stamp. The payload of the SL packet is
the valid information that comes after the header. The payload includes the AU stream
extracted by the MP4 file analyzing module 310.

The generated SL packet is inputted into a FlexMux packet generating module 330,
and the FlexMux packet generating module 330 generates a FlexMux packet
by adding a header that defines a packet type to the SL packet. The packet type is
information for distinguishing video data from audio data.

The generated FlexMux packet is inputted into an RTP packet generating module 
340. Then, the RTP packet generating module 340 generates an RTP packet that could 
be transmitted through the Internet in real-time. 

The RTP packet is a protocol packet of a transport layer that makes it possible to 
transmit data on the Internet in real-time. The RTP packet can be generated by adding 
a header including information for real-time data transmission to a FlexMux packet. 
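
The nested encapsulation described above (AU, then SL packet, then FlexMux packet, then RTP packet) can be sketched in Python. This is a simplified illustration under stated assumptions, not the packet syntax of the embodiment: the SL and FlexMux header layouts, the packet-type codes and all function names are hypothetical stand-ins. The RTP header follows the 12-byte fixed header of RFC 3550 (version 2, no padding, no extension, no CSRC entries); the payload type 96 and the SSRC value are arbitrary example values.

```python
import struct

# Illustrative sketch only. The SL and FlexMux header layouts below are
# simplified stand-ins, NOT the ISO/IEC 14496-1 bit syntax.

PACKET_TYPE_VIDEO = 0  # hypothetical packet-type codes
PACKET_TYPE_AUDIO = 1


def make_sl_packet(au_payload, sequence_number, timestamp):
    """Prepend a toy SL header: 16-bit sequence number (continuity check)
    and 32-bit time stamp (time synchronization)."""
    header = struct.pack("!HI", sequence_number & 0xFFFF, timestamp & 0xFFFFFFFF)
    return header + au_payload


def make_flexmux_packet(sl_packet, packet_type):
    """Prepend a toy FlexMux header: 8-bit packet type and 16-bit length."""
    if len(sl_packet) > 0xFFFF:
        raise ValueError("SL packet too large for the 16-bit length field")
    return struct.pack("!BH", packet_type, len(sl_packet)) + sl_packet


def make_rtp_packet(flexmux_packet, sequence_number, timestamp,
                    ssrc=0x12345678, payload_type=96, marker=False):
    """Prepend the 12-byte fixed RTP header (version 2, no padding,
    no extension, no CSRC entries)."""
    byte0 = 2 << 6  # version = 2 in the two most significant bits
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         sequence_number & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    return header + flexmux_packet


def packetize(au_payload, sequence_number, timestamp,
              packet_type=PACKET_TYPE_VIDEO):
    """AU -> SL packet -> FlexMux packet -> RTP packet."""
    sl = make_sl_packet(au_payload, sequence_number, timestamp)
    fm = make_flexmux_packet(sl, packet_type)
    return make_rtp_packet(fm, sequence_number, timestamp)
```

Each stage only prepends its own header and treats the output of the previous stage as an opaque payload, so a receiver can undo the stages in reverse order.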

Fig. 10 is a diagram depicting a packing transformation process in the streaming
server. The RTP packet generated as described above is transmitted to a client 600 through
the Internet in real-time, and a player mounted on the client 600 decodes the RTP
packet and displays it.

If the packet is a field-shuttering three-dimensional video RTP packet, the player
can produce a three-dimensional distance effect by outputting the stream of the left-eye odd
field and the stream of the right-eye even field in the transmitted order, without having
to discriminate between the left-eye odd field stream and the right-eye even field stream,
synchronize them in time, and then output them. In short, since the RTP packet
multiplexed by the streaming server 300 is packetized in the order of the field
streams required by the display mode requested by the user, the client 600 can output
stereoscopic video data without additional data processing.

[54] The Internet broadcasting system and method of the present invention can reduce
the amount of data considerably by encoding stereoscopic video data effectively,
thus reducing the probability of transmission errors. Therefore, it is possible
to broadcast stereoscopic videos on the Internet in real-time.

[55] In addition, the Internet broadcasting system of the present invention can restore not
only stereoscopic videos but also conventional two-dimensional videos based on the
display mode requested by the user.

[56] While the present invention has been described with respect to certain preferred 

embodiments, it will be apparent to those skilled in the art that various changes and 
modifications may be made without departing from the scope of the invention as 
defined in the following claims. 
Claims

[1] A system for broadcasting MPEG-4-based stereoscopic video data on the 

Internet, comprising: 

an encoding server for encoding stereoscopic video data, audio data, and Object
Descriptor/Binary Format for Scene (OD/BIFS) which is information for
controlling a content, into elementary stream (ES) having a Moving Picture
Experts Group (MPEG)-4 structure;

a Web server for receiving from the client any one among two-dimensional 
video display mode, field-shuttering video display mode and frame-shuttering 
video display mode; and 

a streaming server for generating a real-time transport protocol (RTP) packet for
real-time data transmission on the Internet by multiplexing the ES based on the
display mode inputted into the web server, and transmitting the RTP packet to
the client.

[2] The system as recited in claim 1, wherein the encoding server includes: 

an encoding unit for encoding the stereoscopic video data, the audio data and the 

OD/BIFS into ES having a structure of MPEG-4 temporal scalability (TS); 

an encoding parameter unit for providing encoding information having a size of 

an image and the number of frames to be encoded, to the encoding unit; 

an MPEG 4 (MP4) file generating unit for generating an MP4 file by adding 

metadata to the ES; and 

a storage for storing the MP4 file. 

[3] The system as recited in claim 2, wherein the encoding unit includes: 

an OD/BIFS encoding module for encoding the OD/BIFS data; 
an audio encoding module for encoding the audio data; 
a video encoding module for encoding the stereoscopic video data; and 
an Elementary Stream Interface (ESI) information generating module for 
generating additional information needed for the transmission and decoding of 
the ES. 

[4] The system as recited in claim 3, wherein the video encoding module includes: 

a field separating module for separating the stereoscopic video data into a
left-eye odd field, a left-eye even field, a right-eye odd field and a right-eye even
field;

a base layer encoding module for encoding the left-eye odd field; and
an enhancement encoding module for encoding the left-eye even field, the
right-eye odd field and the right-eye even field.

[5] The system as recited in claim 4, wherein the enhancement encoding module

allocates the left-eye even field to a first enhancement layer, the right-eye odd
field to a second enhancement layer, and the right-eye even field to a third
enhancement layer, and encodes the left-eye even field, the right-eye odd field and
the right-eye even field based on the MPEG-4 TS structure.

[6] The system as recited in claim 4, wherein the MP4 file generating unit generates 

an MP4 file by giving one ES identification (ES_ID) to a set of a left-eye odd 
field, a left-eye even field, a right-eye odd field and a right-eye even field in the 
ES. 

[7] The system as recited in claim 4, wherein if a display mode inputted from the

web server is a two-dimensional video display mode, the streaming server
transmits an ES of a left-eye odd field and a left-eye even field to the client;
if the display mode inputted from the web server is a field-shuttering display
mode, the streaming server multiplexes an ES of the left-eye odd field and the
right-eye even field sequentially and transmits the ES to the client; and
if the display mode inputted from the web server is a frame-shuttering display
mode, the streaming server multiplexes an ES having the left-eye odd field, the
left-eye even field, the right-eye odd field and the right-eye even field sequentially
and transmits the ES to the client.

[8] A method for broadcasting stereoscopic video data to a client on the Internet 

based on MPEG-4, comprising the steps of: 

a) encoding stereoscopic video data, audio data, and Object Descriptor/Binary
Format for Scene (OD/BIFS) which is information for controlling a content into
ES having an MPEG-4 structure;

b) receiving any one among two-dimensional video display mode, field- 
shuttering video display mode and frame-shuttering video display mode from the 
client; and 

c) generating an RTP packet for real-time transmission on the Internet by
multiplexing the ES based on the inputted display mode, and transmitting the RTP
packet to the client.

[9] The method as recited in claim 8, wherein the step a) includes the steps of:

a1) encoding the stereoscopic video data into ES having a structure of MPEG-4
TS;
a2) generating an MP4 file by adding metadata to the ES; and
a3) storing the MP4 file in a storage.

[10] The method as recited in claim 9, wherein the step a1) includes the steps of:

a1-1) encoding the OD/BIFS data;
a1-2) encoding the audio data;
a1-3) encoding the stereoscopic video data; and
a1-4) generating additional information needed for the transmission and
decoding of the ESs.

[11] The method as recited in claim 10, wherein the step a1-3) includes the steps of:

a1-3a) separating the stereoscopic video data into a left-eye odd field, a left-eye
even field, a right-eye odd field and a right-eye even field;
a1-3b) encoding the left-eye odd field; and

a1-3c) encoding the left-eye even field, the right-eye odd field and the right-eye
even field.

[12] The method as recited in claim 11, wherein, at the step a1-3c), the left-eye even

field is allocated to a first enhancement layer; the right-eye odd field is allocated
to a second enhancement layer; and the right-eye even field is allocated to a third
enhancement layer; and the left-eye even field, the right-eye odd field and the
right-eye even field are encoded based on the MPEG-4 TS structure.

[13] The method as recited in claim 11, wherein, at the step a1-3c), an MP4 file is

generated by giving one ES_ID to a set of a left-eye odd field, a left-eye even
field, a right-eye odd field and a right-eye even field in the ES.

[14] The method as recited in claim 11, wherein, at the step c), if a display mode

inputted from the web server is a two-dimensional video display mode, an ES
having a left-eye odd field and a left-eye even field is transmitted to the client;
if the display mode inputted from the web server is a field-shuttering display
mode, an ES having the left-eye odd field and the right-eye even field is
multiplexed sequentially and transmitted to the client; and
if the display mode inputted from the web server is a frame-shuttering display
mode, an ES having the left-eye odd field, left-eye even field, right-eye odd field
and the right-eye even field is multiplexed sequentially and transmitted to the
client.
[Fig. 5]
[Fig. 9]